If a BOX= input data set for PROC BOXPLOT includes one or more block variables, the blocks are displayed incorrectly in the plot: the blocks and the box-and-whisker plots are misaligned. For example, the last box in each block may be displayed as the first box in the following block. If a symbol variable is specified with associated SYMBOLn statements, the plotted symbols are also misaligned. The first group variable level uses the SYMBOL2 statement instead of the SYMBOL1 statement, the second group variable level uses the SYMBOL3 statement, and so on. The same shift in symbols occurs when the default symbols are used.
The HISTORY= data set does not have any of these problems and can be used as a workaround.
The following statements create a SAS data set containing diameter measurements for a part produced on three different machines:
data Parts;
   length Machine $ 4;
   label Sample  = 'Sample Number'
         Machine = 'Machine';
   input Sample $ Machine $ @;
   do i = 1 to 4;
      input Diameter @;
      output;
   end;
   drop i;
   datalines;
1 A386 4.32 4.55 4.16 4.44
2 A386 4.49 4.30 4.52 4.61
1 A455 4.45 4.56 4.38 4.51
2 A455 4.62 4.67 4.70 4.58
1 C334 4.16 4.28 4.31 4.59
2 C334 4.14 4.18 4.08 4.21
;
These statements create a box plot for the measurements in the Parts data set grouped into blocks by the block variable Machine:
symbol1 c=red v=dot;
symbol2 c=blue v=star;
title 'Box Plot for Diameter Grouped By Machine';
proc boxplot data=Parts;
   plot Diameter*Sample (Machine)=Sample / blockpos=3;
run;
The blocks and symbols are correct here, with two Samples per Machine:
The following statements create a BOX= SAS data set of the same PARTS data for input to PROC BOXPLOT. This is a form of pre-summarized data containing one box-and-whisker plot summary statistic or outlier value per observation.
data BoxData;
   length _type_ $8;
   _var_='Diameter';
   do Machine='A386','A455','C334';
      do sample='1','2';
         do _type_='N','MIN','Q1','MEAN','MEDIAN','Q3','MAX','STDDEV';
            input _value_ @@;
            output;
         end;
      end;
   end;
   datalines;
4 4.16 4.240 4.3675 4.380 4.495 4.55 0.16721
4 4.30 4.395 4.4800 4.505 4.565 4.61 0.13038
4 4.38 4.415 4.4750 4.480 4.535 4.56 0.07767
4 4.58 4.600 4.6425 4.645 4.685 4.70 0.05315
4 4.16 4.220 4.3350 4.295 4.450 4.59 0.18193
4 4.08 4.110 4.1525 4.160 4.195 4.21 0.05620
;
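The summary values in the datalines above can be reproduced from the raw Parts data. The following is a minimal sketch using PROC MEANS, assuming Parts is already ordered by Machine and Sample (as it is in this example). The output data set name PartsStats is hypothetical and is not part of the original note.

```sas
/* Sketch: compute the eight statistics per Machine/Sample from Parts. */
/* PartsStats is a hypothetical name used only for illustration.       */
proc means data=Parts noprint;
   by Machine Sample;              /* Parts is already in this order   */
   var Diameter;
   output out=PartsStats(drop=_type_ _freq_)
          n=N min=MIN q1=Q1 mean=MEAN
          median=MEDIAN q3=Q3 max=MAX std=STDDEV;
run;

proc print data=PartsStats noobs;
run;
```

Each row of PartsStats should correspond to one line of the datalines; for Machine A386, Sample 1, the mean works out to (4.32 + 4.55 + 4.16 + 4.44) / 4 = 4.3675.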
Here is code to create the box plot using the pre-summarized BOX= data set:
title 'BOX= input data set';
proc boxplot box=BoxData;
   plot Diameter*Sample (Machine)=Sample / blockpos=3;
run;
The plot should be the same as the one above, but the blocks and symbols are misaligned:
If only the pre-summarized data are available, the correct plot may be obtained by creating a HISTORY= data set instead of a BOX= data set. This type of input data set is structured to have one observation per individual box-and-whisker plot:
data HistData;
   do Machine='A386','A455','C334';
      do sample='1','2';
         input DiameterN DiameterL Diameter1 DiameterX
               DiameterM Diameter3 DiameterH DiameterS;
         output;
      end;
   end;
   datalines;
4 4.16 4.240 4.3675 4.380 4.495 4.55 0.16721
4 4.30 4.395 4.4800 4.505 4.565 4.61 0.13038
4 4.38 4.415 4.4750 4.480 4.535 4.56 0.07767
4 4.58 4.600 4.6425 4.645 4.685 4.70 0.05315
4 4.16 4.220 4.3350 4.295 4.450 4.59 0.18193
4 4.08 4.110 4.1525 4.160 4.195 4.21 0.05620
;

title 'HISTORY= input data set';
proc boxplot history=HistData;
   plot Diameter*Sample (Machine)=Sample / blockpos=3;
run;
Now the box plot blocks and symbols are correct:
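If a BOX= data set already exists, retyping its values is not necessary; it can be reshaped into the HISTORY= layout. The following is a minimal sketch using PROC TRANSPOSE on the BoxData data set from this note. It assumes the data set contains at most one record per statistic per box; the WHERE statement excludes any other record types, such as outliers. The data set names BoxSorted, BoxWide, and HistFromBox are hypothetical.

```sas
/* Sketch: reshape BoxData (one statistic per observation) into a      */
/* HISTORY= data set (one observation per box). BoxSorted, BoxWide,    */
/* and HistFromBox are hypothetical names used for illustration.       */
proc sort data=BoxData out=BoxSorted;
   by Machine Sample;
run;

proc transpose data=BoxSorted out=BoxWide(drop=_name_);
   /* Keep only the eight summary statistics; skip outlier records.    */
   where _type_ in ('N','MIN','Q1','MEAN','MEDIAN','Q3','MAX','STDDEV');
   by Machine Sample;
   id _type_;                      /* each statistic becomes a column  */
   var _value_;
run;

data HistFromBox;
   set BoxWide;
   /* HISTORY= naming: process name plus a one-character suffix.       */
   rename N=DiameterN MIN=DiameterL Q1=Diameter1 MEAN=DiameterX
          MEDIAN=DiameterM Q3=Diameter3 MAX=DiameterH STDDEV=DiameterS;
run;

title 'HISTORY= data set derived from BoxData';
proc boxplot history=HistFromBox;
   plot Diameter*Sample (Machine)=Sample / blockpos=3;
run;
```

The suffixes follow the variable names used in HistData above. Because individual outlier records are dropped, a plot made this way shows only the summary statistics.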
Product Family | Product | System | SAS Release Reported | SAS Release Fixed*
SAS System | SAS/STAT | z/OS | 9.1 TS1M0 | 9.3 TS1M0
SAS System | SAS/STAT | Microsoft® Windows® for 64-Bit Itanium-based Systems | 9.1 TS1M0 | 9.3 TS1M0
SAS System | SAS/STAT | Microsoft Windows Server 2003 Datacenter 64-bit Edition | 9.1 TS1M0 | 9.3 TS1M0
SAS System | SAS/STAT | Microsoft Windows Server 2003 Enterprise 64-bit Edition | 9.1 TS1M0 | 9.3 TS1M0
SAS System | SAS/STAT | Microsoft Windows 2000 Advanced Server | 9.1 TS1M0 |
SAS System | SAS/STAT | Microsoft Windows 2000 Datacenter Server | 9.1 TS1M0 |
SAS System | SAS/STAT | Microsoft Windows 2000 Server | 9.1 TS1M0 |
SAS System | SAS/STAT | Microsoft Windows 2000 Professional | 9.1 TS1M0 |
SAS System | SAS/STAT | Microsoft Windows NT Workstation | 9.1 TS1M0 |
SAS System | SAS/STAT | Microsoft Windows Server 2003 Datacenter Edition | 9.1 TS1M0 | 9.3 TS1M0
SAS System | SAS/STAT | Microsoft Windows Server 2003 Enterprise Edition | 9.1 TS1M0 | 9.3 TS1M0
SAS System | SAS/STAT | Microsoft Windows Server 2003 Standard Edition | 9.1 TS1M0 | 9.3 TS1M0
SAS System | SAS/STAT | Microsoft Windows XP Professional | 9.1 TS1M0 | 9.3 TS1M0
SAS System | SAS/STAT | 64-bit Enabled AIX | 9.1 TS1M0 | 9.3 TS1M0
SAS System | SAS/STAT | 64-bit Enabled HP-UX | 9.1 TS1M0 | 9.3 TS1M0
SAS System | SAS/STAT | 64-bit Enabled Solaris | 9.1 TS1M0 | 9.3 TS1M0
SAS System | SAS/STAT | HP-UX IPF | 9.1 TS1M0 | 9.3 TS1M0
SAS System | SAS/STAT | Linux | 9.1 TS1M0 | 9.3 TS1M0
SAS System | SAS/STAT | OpenVMS Alpha | 9.1 TS1M0 | 9.3 TS1M0
SAS System | SAS/STAT | Tru64 UNIX | 9.1 TS1M0 | 9.3 TS1M0
Type: | Problem Note |
Priority: | high |
Topic: | Analytics ==> Descriptive Statistics; Analytics ==> Exploratory Data Analysis; Analytics ==> Statistical Graphics; SAS Reference ==> Procedures ==> BOXPLOT |
Date Modified: | 2010-10-05 16:35:02 |
Date Created: | 2010-10-05 11:55:32 |